Exploiting Unlabeled Data Using Improved Natural Langua
Authors
Abstract
This paper presents an unsupervised method that uses a limited amount of labeled data and a large pool of unlabeled data to improve natural language call routing performance. The method uses multiple classifiers to select a subset of the unlabeled data to augment the limited labeled data. We evaluated four widely used text classification algorithms: Naive Bayes Classification (NBC), Support Vector Machines (SVM), Boosting and Maximum Entropy (MaxEnt). NBC was found to be the poorest performer of the four classification methods. Combining SVM, Boosting and MaxEnt yielded significant improvements in call classification accuracy over any single classifier, across varying amounts of labeled data.
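The abstract does not state the exact rule used to pick unlabeled data. The sketch below assumes one plausible committee rule, which is not necessarily the paper's criterion. It keeps an unlabeled utterance only when every classifier predicts the same call type and their mean confidence clears a threshold, then adds it to the labeled set with that label. The names `select_unlabeled`, `augment` and `min_confidence` are hypothetical, as is the `(label, confidence)` classifier interface. In practice each callable would wrap a trained SVM, Boosting or MaxEnt model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a classifier maps an utterance to (label, confidence).
Classifier = Callable[[str], Tuple[str, float]]


def select_unlabeled(
    unlabeled: Sequence[str],
    classifiers: Sequence[Classifier],
    min_confidence: float = 0.5,
) -> List[Tuple[str, str]]:
    """Assumed selection rule: keep utterances on which all classifiers agree
    and whose mean confidence is at least min_confidence."""
    selected: List[Tuple[str, str]] = []
    for utt in unlabeled:
        preds = [clf(utt) for clf in classifiers]
        labels = {label for label, _ in preds}
        if len(labels) != 1:
            continue  # the classifiers disagree, so skip this utterance
        mean_conf = sum(conf for _, conf in preds) / len(preds)
        if mean_conf >= min_confidence:
            selected.append((utt, next(iter(labels))))
    return selected


def augment(
    labeled: Sequence[Tuple[str, str]],
    unlabeled: Sequence[str],
    classifiers: Sequence[Classifier],
    min_confidence: float = 0.5,
) -> List[Tuple[str, str]]:
    """Return the labeled set extended with automatically labeled utterances."""
    return list(labeled) + select_unlabeled(unlabeled, classifiers, min_confidence)


if __name__ == "__main__":
    # Toy keyword rules standing in for trained models (purely illustrative).
    def clf_a(u: str) -> Tuple[str, float]:
        return ("Billing", 0.9) if "bill" in u else ("Other", 0.7)

    def clf_b(u: str) -> Tuple[str, float]:
        return ("Billing", 0.8) if ("pay" in u or "bill" in u) else ("Other", 0.6)

    def clf_c(u: str) -> Tuple[str, float]:
        return ("Billing", 0.7) if "bill" in u else ("Other", 0.4)

    pool = ["question about my bill", "I want to pay", "reset my password"]
    print(select_unlabeled(pool, [clf_a, clf_b, clf_c], min_confidence=0.6))
```

With these toy rules, only "question about my bill" is selected at a threshold of 0.6. The classifiers disagree on "I want to pay", and the mean confidence on "reset my password" is below the threshold. Requiring agreement across the committee trades recall of unlabeled data for label precision. That trade-off matters when the automatically labeled examples are fed back into training.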
Similar resources
Semi-supervised Relation Extraction using EM Algorithm
Relation Extraction is the task of identifying relations between entities in a natural language sentence. We propose a semi-supervised approach for relation extraction based on the EM algorithm, which uses a few relation-labeled seed examples and a large number of unlabeled examples (but labeled with entities). We present analysis of how unlabeled data helps in improving the overall accuracy compared t...
Discovery of Informative Unlabeled Data for Improved Learning
In computer vision, the acquisition of sufficient labeled data for training is often time-consuming. However, unlabeled data are conveniently available. The key problem is to discover and incorporate those informative and confidently predicted unlabeled data into the training set for improved learning. In this paper, we discover such unlabeled data by exploiting the locality property of the dat...
Combining Labeled and Unlabeled Data for Learning Cross-Document Structural Relationships
Multi-document discourse analysis has emerged with the potential of improving various NLP applications. Based on the newly proposed Cross-document Structure Theory (CST), this paper describes an empirical study that classifies CST relationships between sentence pairs extracted from topically related documents, exploiting both labeled and unlabeled data. We investigate a binary classifier for de...
Word Sense Disambiguation by Learning from Unlabeled Data
Most corpus-based approaches to natural language processing suffer from a lack of training data. This is because acquiring a large amount of labeled data is expensive. This paper describes a learning method that exploits unlabeled data to tackle the data sparseness problem. The method uses committee learning to predict the labels of unlabeled data that augment the existing training data. Our experimen...
One Class per Named Entity: Exploiting Unlabeled Text for Named Entity Recognition
In this paper, we present a simple yet novel method of exploiting unlabeled text to further improve the accuracy of a high-performance state-of-the-art named entity recognition (NER) system. The method utilizes the empirical property that many named entities occur in one name class only. Using only unlabeled text as the additional resource, our improved NER system achieves an F1 score of 87.13%,...